This Script analyizes absorption and flouresence data from all observations in my study of C. Chamissoi

Chondracanthus chamissoi Phycoerythrin pigment

Scripts are sourced from the process_plate_run R script and called upon in this Markdown. You must store the script in the working directory as well as all input and output files. Go ahead and pick the directory you want to use.

source("process_plate_run.R") #Contains all the functions I have built
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ ggplot2   3.5.1     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## 
## Attaching package: 'zoo'
## 
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## 
## 
## Attaching package: 'plotly'
## 
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## 
## The following object is masked from 'package:graphics':
## 
##     layout
## 
## 
## here() starts at G:/My Drive/ACES/Dissy/analysis/PE
## 
## # Attaching packages: easystats 0.7.4
## ✔ bayestestR  0.16.0   ✔ correlation 0.8.7 
## ✔ datawizard  1.1.0    ✔ effectsize  1.0.0 
## ✔ insight     1.3.0    ✔ modelbased  0.11.0
## ✔ performance 0.14.0   ✔ parameters  0.26.0
## ✔ report      0.6.1    ✔ see         0.11.0
## 
## 
## 
## Attaching package: 'ggpubr'
## 
## 
## The following objects are masked from 'package:datawizard':
## 
##     mean_sd, median_mad

WE FIRST LOAD THE INPUT FILES. Our project has three phycoerythrin and three complimentary fluoresence data sheets. We also have three spreadsheets for sample weights, and a datasheet for our replicates. We are going to upload all xlsx files and convert them into dataframe objects, retaining their file names using the load_excel_files function.

load_excel_files("Input PE")
## Finished loading 10 Excel files from G:/My Drive/ACES/Dissy/analysis/PE/Input PE

STEP 1: CONVERT THE DATA INTO A USABLE FORMAT

The tidy_all wrapper function and it’s subordinated function tidy_and_correct processes spectral absorbency data from a multi-wavelength microplate reader assay to quantify phycoerythrin (PE) content in biological samples. It performs the following key steps:

  1. Data Import and Cleaning:
    Loads raw data from an Excel file (second sheet), removes header rows, and fills missing row identifiers for well labels.

  2. Data Reshaping:
    Organizes the spectral data from multiple wavelengths into a tidy format where each row corresponds to a single well and each column to a specific wavelength measurement.

  3. Blank Correction:
    Averages absorbance values from specified blank wells (e.g., A01, A02, A03) and subtracts these values from all sample wells to correct for background signal.

  4. Quality Filtering:
    Removes any samples with negative absorbance values at any measured wavelength, tracking and reporting which samples were removed and for which wavelength(s). Compares absorbance at X565 and X564.


FLUORESENCE AND ABSORBANCE ANALYSIS

🔄 Data Preparation and Tidy Processing

# Define blank sample wells for each experimental run
blanks1 <- "A01"
blanks2 <- c("A01", "A02", "A03")
blanks3 <- c("H09", "H10", "H11")

# Create named lists of raw dataframes and corresponding blanks
listPE <- list(PE1 = PE1, PE2 = PE2, PE3 = PE3)
listFluor <- list(Fluor1 = Fluor1, Fluor2 = Fluor2, Fluor3 = Fluor3)
listblank <- list(blanks1, blanks2, blanks3)

# Clean and structure all datasets using a custom `tidy_all()` function
tidy_all(listPE, listblank)
## Processing: PE1 with blanks: A01
## Removed rows due to negative values:
## [1] "A01" "C01" "D07" "E01" "F01" "F07" "G01" "H07"
## Saved: PE1_tidy to global environment.
## Processing: PE2 with blanks: A01, A02, A03
## Removed rows due to negative values:
##  [1] "A02" "A03" "A07" "A09" "A11" "B03" "D07" "G02" "G03" "G07"
## Saved: PE2_tidy to global environment.
## Processing: PE3 with blanks: H09, H10, H11
## Removed rows due to negative values:
## [1] "D07" "D08" "G01" "H11"
## Saved: PE3_tidy to global environment.
tidy_all(listFluor, listblank)
## Processing: Fluor1 with blanks: A01 
## Saved: Fluor1_tidy to global environment.
## Processing: Fluor2 with blanks: A01, A02, A03
## Removed rows due to negative values:
## [1] "A02"
## Saved: Fluor2_tidy to global environment.
## Processing: Fluor3 with blanks: H09, H10, H11
## Removed rows due to negative values:
##  [1] "E06" "E07" "E08" "E09" "E10" "E11" "E12" "F01" "F04" "F05" "F06" "F07"
## [13] "F08" "F09" "F10" "F11" "F12" "G02" "G03" "G04" "G06" "G07" "G08" "G09"
## [25] "G10" "G11" "G12" "H01" "H02" "H03" "H04" "H05" "H06" "H07" "H08" "H09"
## [37] "H10" "H12"
## Saved: Fluor3_tidy to global environment.

All tidied up now! ✅ ## 📊 Summary Statistics Table

STEP 2: REVIEW ABSORBANCE

# Create named summaries of cleaned fluorescence and absorbance data
summaries <- list(
  Fluor1 = summary(Fluor1_tidy$Xred),
  Fluor2 = summary(Fluor2_tidy$Xred),
  Fluor3 = summary(Fluor3_tidy$Xred),
  PE1    = summary(PE1_tidy$X565),
  PE2    = summary(PE2_tidy$X565),
  PE3    = summary(PE3_tidy$X565)
)

# Combine into a single summary table
all_stats <- unique(unlist(lapply(summaries, names)))
summary_table <- data.frame(
  Statistic = all_stats,
  do.call(cbind, lapply(summaries, function(s) {
    s_named <- as.list(s)
    sapply(all_stats, function(k) s_named[[k]] %||% NA)
  }))
)
colnames(summary_table)[-1] <- names(summaries)
summary_table[,-1] <- round(summary_table[,-1], 2)
print(summary_table)
##         Statistic   Fluor1   Fluor2   Fluor3  PE1  PE2  PE3
## Min.         Min.     0.00     0.00     6.33 0.00 0.00 0.00
## 1st Qu.   1st Qu.  5425.00  2583.00  8456.58 0.01 0.01 0.01
## Median     Median 11356.00  6337.00 15958.83 0.01 0.01 0.01
## Mean         Mean 15257.96  8108.97 22470.98 0.02 0.02 0.03
## 3rd Qu.   3rd Qu. 22447.25 10052.50 24916.33 0.02 0.02 0.03
## Max.         Max. 47848.00 45258.00 84905.33 0.58 0.16 0.32
## NA's         NA's       NA       NA     4.00   NA   NA   NA

➕ Difference Calculation for PE Absorbance

# Compute difference between 565 and 564 nm absorbance for each PE dataset
PE_list <- list(PE1_tidy = PE1_tidy, PE2_tidy = PE2_tidy, PE3_tidy = PE3_tidy)
PE_list <- lapply(PE_list, function(df) {
  df %>% dplyr::mutate(diff_absorbance = X565 - X564)
})
list2env(PE_list, envir = .GlobalEnv)  # Save updated datasets to global environment
## <environment: R_GlobalEnv>

📈 Generate and Save Absorbance Difference Plots

# For each tidy PE dataset, plot `diff_absorbance` by Cell_ID and save as PNG

plots <- list()  # Initialize empty list to store plots

for (name in names(PE_list)) {
  df <- PE_list[[name]]
  
  p <- ggplot(df, aes(x = Cell_ID, y = diff_absorbance)) +
    geom_bar(stat = "identity", fill = "skyblue") +
    geom_text(aes(label = round(diff_absorbance, 3)), vjust = -0.5, size = 3) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
    labs(
      title = paste("Difference between 565 and 564 Absorbance -", name),
      x = "Sample (Cell_ID)",
      y = "Difference (X565 - X564)"
    )
  
  # Save static plot as PNG (optional)
  ggsave(paste0(name, "_diff_absorbance.png"), plot = p, width = 10, height = 6, dpi = 300)
  
  # Store interactive plot in the list
  plots[[name]] <- ggplotly(p)
}

# Display all interactive plots in your HTML output
htmltools::tagList(plots)

📈 Look for interference from other spectra measured

df_ratios <- PE2_tidy %>%
  mutate(
    `X455/X564` = X455 / X564,
    `X592/X564` = X592 / X564,
    Sample = row_number()
  ) %>%
  pivot_longer(cols = c(`X455/X564`, `X592/X564`), names_to = "Ratio", values_to = "Value")

# Plot
  
ggplot(df_ratios, aes(x = Ratio, y = Value)) +
  geom_boxplot() +
  geom_jitter(width = 0.1, alpha = 0.5, color = "black") +
  geom_hline(yintercept = 1.0, linetype = "dashed", color = "red") +
  labs(
    title = "Relative Absorbance: X455 and X592 Compared to X564",
    y = "Absorbance Ratio",
    x = "Absorbance Ratio Type"
  ) +
  theme_minimal()

STEP 3: JOIN THE DATA WITH ITS SAMPLE WEIGHTS SPREADSHEET

### Description of joindf_by_id Function

The joindf_by_id function takes two data frames (df1 and df2) and merges them by matching unique identifiers related to samples, specifically using either a Cell_ID column or a plate well column. The key steps and features are:

#Set up new list of weights data to join
list_weights<- list(pe1_weights_id, pe2_weights_id, pe3_weights_id)

# Loop through both lists in parallel
mapply(function(df1, df2, name) {
  output_file <- paste0(name, "_joined.csv")
  joindf_by_id(df1, df2, output_file, key_df1 = "Cell_ID", key_df2 = "plate well")
}, df1 = PE_list, df2 = list_weights, name = names(listPE))
## 
## Values needing repetition:
## Blank01 needs to be repeated
## Lab_08 needs to be repeated
## Lab_14 needs to be repeated
## Lab_16 needs to be repeated
## Lab_20 needs to be repeated
## Lab_22 needs to be repeated
## Lab_39 needs to be repeated
## Lab_45 needs to be repeated
## Join complete. Output saved to: PE1_joined.csv
## 
## Values needing repetition:
## Blank02 needs to be repeated
## Blank03 needs to be repeated
## Lab_02 needs to be repeated
## Lab_03 needs to be repeated
## Lab_04 needs to be repeated
## Lab_14 needs to be repeated
## Lab_24 needs to be repeated
## Lab_26 needs to be repeated
## Join complete. Output saved to: PE2_joined.csv
## 
## Values needing repetition:
## juice ilo needs to be repeated
## Lima R03 oven dry needs to be repeated
## Blank_06 needs to be repeated
## Join complete. Output saved to: PE3_joined.csv
##                    PE1_tidy                       
## result_df          tbl_df,12                      
## merged_cells       88                             
## unmatched_cells    8                              
## unmatched_saved_to "unmatched_rows_PE1_joined.csv"
## assigned_to_global TRUE                           
##                    PE2_tidy                       
## result_df          tbl_df,12                      
## merged_cells       86                             
## unmatched_cells    10                             
## unmatched_saved_to "unmatched_rows_PE2_joined.csv"
## assigned_to_global TRUE                           
##                    PE3_tidy                       
## result_df          tbl_df,10                      
## merged_cells       88                             
## unmatched_cells    4                              
## unmatched_saved_to "unmatched_rows_PE3_joined.csv"
## assigned_to_global TRUE

STEP 4: CALCULATE PE CONTENTPurified phycoerythrin representation Calculating PE This function takes a tidy data frame containing absorbance measurements and calculates the concentration of phycoerythrin (PE) for each sample based on the Beer & Eshel (1985) method. It performs the following steps:

  1. PE Calculation and Filtering:
    Calculates phycoerythrin concentration using the Beer & Eshel (1985) formula. Samples with negative PE values are also removed and reported. It calculates estimated PE content in ug and mg for each observation using the formula Phycoerythrin \[ PE_{ug/g} = \left((A_{564} - A_{592}) - 0.20 \times (A_{455} - A_{592})\right) \times 0.12 \]

from Beer S, Eshel A (1985) Determining phycoerythrin and phycocyanin concentrations in aqueous crude extracts of red algae. Aust J Mar Freshwater Res 36:785–792, https://doi.org/10.1071/MF9850785.

  1. Filtering Negative PE Values:
    Samples with negative PE concentrations are identified and removed. These removed samples are printed in the console with the reason for removal.

  2. Normalization to Sample Weight:
    The PE concentration is converted from micrograms per milliliter (µg/mL) to milligrams per gram of dry sample (mg/g) by adjusting for the extract volume and the individual sample weights provided in the data frame. This ensures accurate PE quantification based on the exact weight of each sample rather than using a fixed default weight.

  3. Output and Metadata:
    The function returns a filtered data frame containing valid samples with their calculated PE concentrations in mg/g. Additionally, it stores the filtered-out rows (samples with negative PE) as an attribute called "removed_rows_pe" for further inspection if needed.

This process helps ensure the quality and accuracy of PE measurements by excluding invalid data and normalizing concentrations based on actual sample weights.

PE1_calc <- calculate_pe_and_filter(PE1_joined, sample_weight_col = "sample weight", sample_id_col="join_id")
## Removed observations due to negative PE values:
## # A tibble: 26 × 3
##    join_id PE_mg_per_mL Removal_Reason        
##    <chr>          <dbl> <chr>                 
##  1 A05       -0.000168  removed because PE < 0
##  2 A07       -0.0000960 removed because PE < 0
##  3 A11       -0.000144  removed because PE < 0
##  4 A12       -0.0000480 removed because PE < 0
##  5 B01       -0.0000960 removed because PE < 0
##  6 B05       -0.00106   removed because PE < 0
##  7 B09       -0.0000240 removed because PE < 0
##  8 C07       -0.0000240 removed because PE < 0
##  9 D10       -0.000216  removed because PE < 0
## 10 E04       -0.0000480 removed because PE < 0
## # ℹ 16 more rows
PE2_calc <- calculate_pe_and_filter(PE2_joined, sample_weight_col = "sample weight", sample_id_col="join_id")
## Removed observations due to negative PE values:
## # A tibble: 12 × 3
##    join_id PE_mg_per_mL Removal_Reason        
##    <chr>          <dbl> <chr>                 
##  1 A01       -0.000384  removed because PE < 0
##  2 A05       -0.00319   removed because PE < 0
##  3 D01       -0.0000480 removed because PE < 0
##  4 D03       -0.0000240 removed because PE < 0
##  5 E01       -0.000120  removed because PE < 0
##  6 E03       -0.0000960 removed because PE < 0
##  7 E04       -0.000552  removed because PE < 0
##  8 E06       -0.000144  removed because PE < 0
##  9 E07       -0.0000720 removed because PE < 0
## 10 E09       -0.0000480 removed because PE < 0
## 11 F01       -0.00175   removed because PE < 0
## 12 G04       -0.0000240 removed because PE < 0
PE3_calc <- calculate_pe_and_filter(PE3_joined, sample_weight_col = "sample weight", sample_id_col="join_id")
## Removed observations due to negative PE values:
## # A tibble: 12 × 3
##    join_id PE_mg_per_mL Removal_Reason        
##    <chr>          <dbl> <chr>                 
##  1 E09       -0.0000720 removed because PE < 0
##  2 E10       -0.0000720 removed because PE < 0
##  3 E12       -0.0000720 removed because PE < 0
##  4 F03       -0.000216  removed because PE < 0
##  5 F05       -0.0000240 removed because PE < 0
##  6 G02       -0.000120  removed because PE < 0
##  7 G04       -0.0000960 removed because PE < 0
##  8 G05       -0.0000720 removed because PE < 0
##  9 G12       -0.0000720 removed because PE < 0
## 10 H03       -0.0000480 removed because PE < 0
## 11 H09       -0.0000480 removed because PE < 0
## 12 H10       -0.0000240 removed because PE < 0

STEP 5: FLUORESENCE DATA VISUALIZATION

fluor_list <- list(
  Fluor1 = Fluor1_tidy,
  Fluor2 = Fluor2_tidy,
  Fluor3 = Fluor3_tidy
)

library(ggplot2)
library(plotly)
library(htmltools)

plots <- list()  # Initialize empty list to store interactive plots

for (name in names(fluor_list)) {
  df <- fluor_list[[name]]
  
  p <- ggplot(df, aes(x = Cell_ID, y = Xred)) +
    geom_bar(stat = "identity", fill = "skyblue") +
    geom_text(aes(label = round(Xred, 3)), vjust = -0.5, size = 3) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
    labs(
      title = paste("530/25,590/35 -", name),
      x = "Sample (Cell_ID)",
      y = "Fluorescence"
    )
  
  # Save static version of plot (optional)
  filename <- paste0(name, "_fluorescence.png")
  ggsave(filename, plot = p, width = 10, height = 6, dpi = 300)
  
  # Store the interactive version of the plot
  plots[[name]] <- ggplotly(p)
}

# Display all interactive plots together in the HTML output
htmltools::tagList(plots)

STEP 6: FLUORESENCE DATA JOIN

Lets join this data to the PE data that has already been joined with the weights and id’s table.

joindf_by_id(PE1_calc, Fluor1_tidy, "pe_fluor1.csv", key_df1 = "join_id", key_df2 = "Cell_ID")
## 
## Values needing repetition:
## A01 needs to be repeated
## A05 needs to be repeated
## A07 needs to be repeated
## A11 needs to be repeated
## A12 needs to be repeated
## B01 needs to be repeated
## B05 needs to be repeated
## B09 needs to be repeated
## C01 needs to be repeated
## C07 needs to be repeated
## D07 needs to be repeated
## D10 needs to be repeated
## E01 needs to be repeated
## E04 needs to be repeated
## E08 needs to be repeated
## E09 needs to be repeated
## E10 needs to be repeated
## E11 needs to be repeated
## F01 needs to be repeated
## F07 needs to be repeated
## F08 needs to be repeated
## G01 needs to be repeated
## G02 needs to be repeated
## G03 needs to be repeated
## G04 needs to be repeated
## G05 needs to be repeated
## G06 needs to be repeated
## G07 needs to be repeated
## G09 needs to be repeated
## G10 needs to be repeated
## H05 needs to be repeated
## H07 needs to be repeated
## H08 needs to be repeated
## H12 needs to be repeated
## Join complete. Output saved to: pe_fluor1.csv
## $result_df
## # A tibble: 62 × 16
##    join_id  X455  X564   X592    X618    X645   X565 diff_absorbance ID    
##    <chr>   <dbl> <dbl>  <dbl>   <dbl>   <dbl>  <dbl>           <dbl> <chr> 
##  1 A02     0.03  0.018 0.012  0.012   0.009   0.017         -0.00100 Lab_01
##  2 A03     0.029 0.019 0.012  0.011   0.008   0.019          0       Lab_01
##  3 A04     0.022 0.013 0.009  0.009   0.007   0.013          0       Lab_01
##  4 A06     0.015 0.007 0.0050 0.00400 0.00200 0.0050        -0.00200 Lab_02
##  5 A08     0.015 0.009 0.006  0.0050  0.00300 0.008         -0.00100 Lab_03
##  6 A09     0.029 0.012 0.007  0.007   0.00300 0.01          -0.00200 Lab_03
##  7 A10     0.063 0.053 0.044  0.04    0.035   0.048         -0.00500 Lab_03
##  8 B02     0.018 0.008 0.0050 0.00400 0.00300 0.008          0       Lab_05
##  9 B03     0.028 0.016 0.013  0.012   0.01    0.016          0       Lab_05
## 10 B04     0.019 0.008 0.0050 0.00400 0.00300 0.008          0       Lab_05
## # ℹ 52 more rows
## # ℹ 7 more variables: `sample weight` <dbl>, `Tray well` <chr>, date <dttm>,
## #   PE_mg_per_mL <dbl>, total_PE_mg <dbl>, PE_mg_per_g_sample <dbl>, Xred <dbl>
## 
## $merged_cells
## [1] 62
## 
## $unmatched_cells
## [1] 34
## 
## $unmatched_saved_to
## [1] "unmatched_rows_pe_fluor1.csv"
## 
## $assigned_to_global
## [1] TRUE
joindf_by_id(PE2_calc, Fluor2_tidy, "pe_fluor2.csv", key_df1 = "join_id", key_df2 = "Cell_ID")
## 
## Values needing repetition:
## A01 needs to be repeated
## A03 needs to be repeated
## A05 needs to be repeated
## A07 needs to be repeated
## A09 needs to be repeated
## A11 needs to be repeated
## B03 needs to be repeated
## D01 needs to be repeated
## D03 needs to be repeated
## D07 needs to be repeated
## E01 needs to be repeated
## E03 needs to be repeated
## E04 needs to be repeated
## E06 needs to be repeated
## E07 needs to be repeated
## E09 needs to be repeated
## F01 needs to be repeated
## G02 needs to be repeated
## G03 needs to be repeated
## G04 needs to be repeated
## G07 needs to be repeated
## Join complete. Output saved to: pe_fluor2.csv
## $result_df
## # A tibble: 74 × 16
##    join_id    X564     X592   X455    X565 diff_absorbance ID    `sample weight`
##    <chr>     <dbl>    <dbl>  <dbl>   <dbl>           <dbl> <chr>           <dbl>
##  1 A04     0.0137  0.00967  0.0167 0.0127         -0.00100 Lab_…            24.5
##  2 A06     0.0103  0.00433  0.0163 0.0103          0       Lab_…            24.1
##  3 A08     0.00367 0.000667 0.0127 0.00367         0       Lab_…            23.7
##  4 A10     0.0103  0.00533  0.0213 0.00933        -0.00100 Lab_…            23  
##  5 A12     0.00867 0.00267  0.0127 0.00867         0       Lab_…            26.9
##  6 B01     0.0163  0.00733  0.0213 0.0153         -0.00100 Lab_…            23.6
##  7 B02     0.00933 0.00533  0.0253 0.00933         0       Lab_…            24.2
##  8 B04     0.00467 0.00167  0.0117 0.00467         0       Lab_…            26.4
##  9 B05     0.0123  0.00733  0.0223 0.0113         -0.00100 Lab_…            24.4
## 10 B06     0.0643  0.0533   0.0763 0.0603         -0.00400 Lab_…            24.4
## # ℹ 64 more rows
## # ℹ 8 more variables: `Tray well` <chr>, date <dttm>, ...6 <lgl>, ...7 <chr>,
## #   PE_mg_per_mL <dbl>, total_PE_mg <dbl>, PE_mg_per_g_sample <dbl>, Xred <dbl>
## 
## $merged_cells
## [1] 74
## 
## $unmatched_cells
## [1] 21
## 
## $unmatched_saved_to
## [1] "unmatched_rows_pe_fluor2.csv"
## 
## $assigned_to_global
## [1] TRUE
joindf_by_id(PE3_calc, Fluor3_tidy, "pe_fluor3.csv", key_df1 = "join_id", key_df2 = "Cell_ID")
## 
## Values needing repetition:
## E06 needs to be repeated
## E07 needs to be repeated
## E08 needs to be repeated
## E11 needs to be repeated
## F01 needs to be repeated
## F04 needs to be repeated
## F06 needs to be repeated
## F07 needs to be repeated
## F08 needs to be repeated
## F09 needs to be repeated
## F10 needs to be repeated
## F11 needs to be repeated
## F12 needs to be repeated
## G03 needs to be repeated
## G06 needs to be repeated
## G07 needs to be repeated
## G08 needs to be repeated
## G09 needs to be repeated
## G10 needs to be repeated
## G11 needs to be repeated
## H01 needs to be repeated
## H02 needs to be repeated
## H04 needs to be repeated
## H05 needs to be repeated
## H06 needs to be repeated
## H07 needs to be repeated
## H08 needs to be repeated
## H12 needs to be repeated
## Join complete. Output saved to: pe_fluor3.csv
## $result_df
## # A tibble: 58 × 14
##    join_id   Xred   X564   X565    X592   X455 diff_absorbance ID    
##    <chr>    <dbl>  <dbl>  <dbl>   <dbl>  <dbl>           <dbl> <chr> 
##  1 A01      8004. 0.0193 0.0193 0.0123  0.0343         0       Lab_40
##  2 A02      7147. 0.0133 0.0133 0.00633 0.0253         0       Lab_40
##  3 A03     10370. 0.022  0.022  0.012   0.039          0       Lab_40
##  4 A04     20048. 0.0193 0.0193 0.00433 0.0243         0       Lab_41
##  5 A05     21192. 0.0283 0.0273 0.0123  0.0263        -0.00100 Lab_41
##  6 A06     22423. 0.0343 0.0333 0.0153  0.0503        -0.00100 Lab_41
##  7 A07     12791. 0.018  0.018  0.007   0.028          0       Lab_42
##  8 A08     14773. 0.0163 0.0163 0.00333 0.0233         0       Lab_42
##  9 A09     14830. 0.0223 0.0223 0.0103  0.0323         0       Lab_42
## 10 A10     37830. 0.0603 0.0593 0.0243  0.0403        -0.00100 Lab_43
## # ℹ 48 more rows
## # ℹ 6 more variables: `sample weight` <dbl>, `Tray well` <chr>, date <dttm>,
## #   PE_mg_per_mL <dbl>, total_PE_mg <dbl>, PE_mg_per_g_sample <dbl>
## 
## $merged_cells
## [1] 52
## 
## $unmatched_cells
## [1] 28
## 
## $unmatched_saved_to
## [1] "unmatched_rows_pe_fluor3.csv"
## 
## $assigned_to_global
## [1] TRUE
pe_fluor_all <- list(pe_fluor1, pe_fluor2, pe_fluor3)

# Add a run column based on list position
pe_fluor_all <- Map(function(df, i) {
  df$run <- factor(paste0("run", i))
  df
}, pe_fluor_all, seq_along(pe_fluor_all))

common_cols <- Reduce(intersect, lapply(pe_fluor_all, names))

# Keep only the common columns in each dataframe before binding
combined_df <- bind_rows(
  lapply(pe_fluor_all, function(df) df[common_cols]),
  .id = "run"
)
#Finally merge all spreadsheets into one for replicate testing and analysis
#Scale PE bigger for regression purposes
combined_df <- combined_df %>%
  mutate(PE_scaled = PE_mg_per_g_sample * 1000)

1️⃣ Full Dataset Plot with Points Colored by Run

p_all <- ggplot(combined_df, aes(x = PE_mg_per_g_sample, y = Xred, color = run)) +
  geom_point(alpha = 0.6) +
  theme_minimal() +
  labs(title = "Xred vs PE_mg_per_g_sample (All Runs)",
       x = "PE (mg/g sample)",
       y = "Xred Fluorescence") +
  scale_color_brewer(palette = "Dark2")

ggsave("Xred_PE_all_runs.png", plot = p_all, width = 10, height = 6, dpi = 300)
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_point()`).
print(p_all)
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_point()`).

# Remove unwanted rows first
combined_df <- combined_df[-(190:194), ]

# Add new factor level before assignment
levels(combined_df$run) <- c(levels(combined_df$run), "run 3 fresh")

# Assign new factor value to rows 169 to 189 (adjusted for new row count)
combined_df$run[169:189] <- "run 3 fresh"

2️⃣ Loop Through Each Run and Save Regressions

runs <- levels(combined_df$run)
for (r in runs) {
  df_sub <- combined_df %>% filter(run == r)

  if (nrow(df_sub) < 4) {
    warning(paste("Skipping run", r, "- not enough data"))
    next
  }
#df_sub <- combined_df %>% filter(run == "run1")
  # Fit models
  model_linear <- lm(Xred ~ PE_scaled, data = df_sub)
  model_log    <- lm(Xred ~ log(PE_scaled + 0.001), data = df_sub)
  model_poly   <- lm(Xred ~ PE_scaled + I(PE_scaled^2), data = df_sub)

  # Get annotation
  ann_linear <- get_stats(model_linear, "Linear")
  ann_log    <- get_stats(model_log, "Log")
  ann_poly   <- get_stats(model_poly, "Poly")

  # Plot
  p <- ggplot(df_sub, aes(x = PE_scaled, y = Xred)) +
    geom_point(alpha = 0.6, color = "black") +
    geom_text_repel(aes(label = join_id), size = 2, max.overlaps = 50) +
    stat_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "blue") +
    stat_smooth(method = "lm", formula = y ~ log(x + 0.001), se = FALSE, color = "green", linetype = "dashed") +
    stat_smooth(method = "lm", formula = y ~ x + I(x^2), se = FALSE, color = "red", linetype = "dotdash") +
    annotate("text", x = Inf, y = Inf, label = ann_linear, hjust = 1.05, vjust = 2, color = "blue", size = 4) +
    annotate("text", x = Inf, y = Inf, label = ann_log, hjust = 1.05, vjust = 3.5, color = "green", size = 4) +
    annotate("text", x = Inf, y = Inf, label = ann_poly, hjust = 1.05, vjust = 5, color = "red", size = 4) +
    theme_minimal() +
    labs(title = paste("Regression Models: Xred vs PE (", r, ")"),
         x = "PE (mg/g sample)",
         y = "Xred Fluorescence")

  # Save plot
  filename <- paste0("Xred_PE_regressions_", r, ".png")
  ggsave(filename, plot = p, width = 10, height = 6, dpi = 300)
  plot(p)
}
## Warning: Removed 6 rows containing non-finite outside the scale range (`stat_smooth()`).
## Removed 6 rows containing non-finite outside the scale range (`stat_smooth()`).
## Removed 6 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_text_repel()`).
## Warning: Removed 6 rows containing non-finite outside the scale range (`stat_smooth()`).
## Removed 6 rows containing non-finite outside the scale range (`stat_smooth()`).
## Removed 6 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_text_repel()`).

Function: analyze_replicates

🔍 analyze_replicates() Function This function processes replicate data for each sample, calculates summary statistics, and optionally selects the best 3 replicates that minimize the coefficient of variation (CV) to reduce the effect of outliers or noisy data.

Key Features: Input: A data frame with replicate observations and associated metadata (e.g., sample weight, join_id, date, etc.).

Output: A summary table with per-sample means, standard deviations, standard errors, CVs, and maximum deviation percentages for each numeric variable.

Optional Filtering: When choose_best_3 = TRUE, the function:

Evaluates all combinations of 3 replicates,

Selects the combination with the lowest CV,

Computes statistics only on those 3 replicates.

Metadata Summary: Also calculates:

The number of replicates per sample,

Average sample weight,

Replication dates,

A list of included/excluded rows (if applicable).

Output File: A wide-format CSV file (e.g., replicate_analysis_summary.csv) containing one row per sample and all summary statistics.

This code chunk runs a function to analyze replicate measurements for each sample in combined_df, selecting the best 3 replicates based on consistency. It outputs summary statistics with the prefix “E_rep_analy” and updates the main dataframe. Then, it generates histograms with error bars for key variables (PE_ug_per_g, Xred) to visualize the distribution and variation across samples.

#combine technical replicates
analyze_replicates(combined_df,
                  id_col = "ID",
                  join_col = "join_id",
                               weight_col = "sample weight",
                               date_col = "date",
                               output_prefix = "all_rep_analy",
                                choose_best_3 = FALSE)
## Summary saved to: all_rep_analy_summary.csv
## # A tibble: 51 × 65
##    ID          PE_mg_per_g_sample_m…¹ PE_mg_per_mL_mean PE_scaled_mean X455_mean
##    <fct>                        <dbl>             <dbl>          <dbl>     <dbl>
##  1 juice fres…                0.217           0.00276           217.      0.0243
##  2 juice fres…                0.362           0.00335           362.      0.0262
##  3 juice ilo                  0.0613          0.000624           61.3     0.0103
##  4 Lab_01                     0.0346          0.000326           34.6     0.0228
##  5 Lab_02                     0.00228         0.0000360           2.28    0.0138
##  6 Lab_03                     0.0328          0.000307           32.8     0.0282
##  7 Lab_04                     0.0236          0.000372           23.6     0.0233
##  8 Lab_05                     0.0132          0.000200           13.2     0.0292
##  9 Lab_06                     0.0860          0.000869           86.0     0.173 
## 10 Lab_07                     0.0535          0.000562           53.5     0.0195
## # ℹ 41 more rows
## # ℹ abbreviated name: ¹​PE_mg_per_g_sample_mean
## # ℹ 60 more variables: X564_mean <dbl>, X565_mean <dbl>, X592_mean <dbl>,
## #   Xred_mean <dbl>, diff_absorbance_mean <dbl>, total_PE_mg_mean <dbl>,
## #   PE_mg_per_g_sample_sd <dbl>, PE_mg_per_mL_sd <dbl>, PE_scaled_sd <dbl>,
## #   X455_sd <dbl>, X564_sd <dbl>, X565_sd <dbl>, X592_sd <dbl>, Xred_sd <dbl>,
## #   diff_absorbance_sd <dbl>, total_PE_mg_sd <dbl>, …
#combine technical replicates choose best three 
analyze_replicates(combined_df,
                  id_col = "ID",
                  join_col = "join_id",
                               weight_col = "sample weight",
                               date_col = "date",
                               output_prefix = "E_rep_analy",
                                choose_best_3 = TRUE)
## Summary saved to: E_rep_analy_summary.csv
## # A tibble: 51 × 85
##    ID          PE_mg_per_g_sample_m…¹ PE_mg_per_mL_mean PE_scaled_mean X455_mean
##    <fct>                        <dbl>             <dbl>          <dbl>     <dbl>
##  1 juice fres…                0.217           0.00276           217.      0.0243
##  2 juice fres…                0.362           0.00335           362.      0.0262
##  3 juice ilo                  0.0613          0.000624           61.3     0.0103
##  4 Lab_01                     0.0229          0.000392           22.9     0.027 
##  5 Lab_02                     0.00228         0.0000360           2.28    0.0138
##  6 Lab_03                     0.0204          0.00044            20.4     0.0163
##  7 Lab_04                     0.0236          0.000372           23.6     0.0233
##  8 Lab_05                     0.00569         0.000136            5.69    0.0198
##  9 Lab_06                     0.0641          0.00127            64.1     0.0523
## 10 Lab_07                     0.0735          0.000616           73.5     0.0178
## # ℹ 41 more rows
## # ℹ abbreviated name: ¹​PE_mg_per_g_sample_mean
## # ℹ 80 more variables: X564_mean <dbl>, X565_mean <dbl>, X592_mean <dbl>,
## #   Xred_mean <dbl>, diff_absorbance_mean <dbl>, total_PE_mg_mean <dbl>,
## #   PE_mg_per_g_sample_sd <dbl>, PE_mg_per_mL_sd <dbl>, PE_scaled_sd <dbl>,
## #   X455_sd <dbl>, X564_sd <dbl>, X565_sd <dbl>, X592_sd <dbl>, Xred_sd <dbl>,
## #   diff_absorbance_sd <dbl>, total_PE_mg_sd <dbl>, …
#Make a bunch of graphs 🖨️🖨️🖨️
#final_var <- c("PE_mg_per_gram_sample_mean", "X565_mean", "PE_ug_per_gram_sample_cv", "Xred_mean", "Xred_mean")

graph_histograms_with_error(
  data = combined_df,
  variables = c("PE_mg_per_g_sample", "Xred"),
  id_col = "ID"
)
## Saving 7 x 5 in image
## Saving 7 x 5 in image
## Plots saved to folder: histogram_analysis_plots
## $PE_mg_per_g_sample
## 
## $Xred

Check effects of enhancement algorithm

PErep_enhanced <- read_csv("E_rep_analy_summary.csv") #retrieve analized replicates from enhanced CV 
## Rows: 51 Columns: 85
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): ID, join_ids, replicate_date
## dbl (62): PE_mg_per_g_sample_mean, PE_mg_per_mL_mean, PE_scaled_mean, X455_m...
## num (20): PE_mg_per_g_sample_included_rows, PE_mg_per_mL_included_rows, PE_s...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
PErep_all <- read_csv("all_rep_analy_summary.csv") #retrieve analized replicates without enhancement 
## Rows: 51 Columns: 65
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): ID, join_ids, replicate_date
## dbl (62): PE_mg_per_g_sample_mean, PE_mg_per_mL_mean, PE_scaled_mean, X455_m...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#Note, this is a product of analyze_replicates using enhanced replicate selection.
SE_change <- PErep_enhanced$PE_mg_per_g_sample_max_dev_pct -PErep_all$PE_mg_per_g_sample_max_dev_pct
print(SE_change)
##  [1]    0.0000000    0.0000000    0.0000000  -64.2502175    0.0000000
##  [6] -151.0746796    0.0000000 -222.4021594 -123.2779150  -50.8332025
## [11]  -56.5204226  -85.1448923  -50.2539810  -67.2754758  -73.3978154
## [16]  -52.2745398   -0.1267003  -55.1960482    0.0000000    0.0000000
## [21]  -65.7598339    0.0000000    0.0000000  -48.5333937  -43.4997256
## [26]    0.0000000    0.0000000    0.0000000    0.0000000    0.0000000
## [31]    0.0000000    0.0000000  -72.5501845  -56.8479013    0.0000000
## [36]    0.0000000  -56.0968444  -70.4508636  -66.0218326  -49.2541002
## [41]  -82.4368101  -24.2029729    0.0000000    0.0000000    0.0000000
## [46]    0.0000000    0.0000000    0.0000000    0.0000000    0.0000000
## [51]          NaN
SE_change <- SE_change[-51] #remove one NaN at the end
paste("# number of replicates SE acted upon:", length(SE_change[SE_change != 0]))
## [1] "# number of replicates SE acted upon: 24"
paste("average improvement by max deviation as a percent of the mean:", abs(mean(SE_change[SE_change != 0], na.rm = TRUE)))
## [1] "average improvement by max deviation as a percent of the mean: 70.32010467602"
joindf_by_id(PErep_enhanced, `Sample data`, "PErep_final.csv", key_df1 = "ID", key_df2 = "ID")
## 
## Values needing repetition:
## Lab_27 needs to be repeated
## Lab_28 needs to be repeated
## Lab_29 needs to be repeated
## Lab_32 needs to be repeated
## Lab_47 needs to be repeated
## Lab_52 needs to be repeated
## Lab_53 needs to be repeated
## Lab_54 needs to be repeated
## Lab_55 needs to be repeated
## Lab_56 needs to be repeated
## Lab_57 needs to be repeated
## Lab_58 needs to be repeated
## Lab_59 needs to be repeated
## Lab_60 needs to be repeated
## Lab_61 needs to be repeated
## Lab_62 needs to be repeated
## Lab_63 needs to be repeated
## Lab_64 needs to be repeated
## Join complete. Output saved to: PErep_final.csv
## $result_df
## # A tibble: 51 × 94
##    join_id     PE_mg_per_g_sample_m…¹ PE_mg_per_mL_mean PE_scaled_mean X455_mean
##    <chr>                        <dbl>             <dbl>          <dbl>     <dbl>
##  1 juice fres…                0.217           0.00276           217.      0.0243
##  2 juice fres…                0.362           0.00335           362.      0.0262
##  3 juice ilo                  0.0613          0.000624           61.3     0.0103
##  4 Lab_01                     0.0229          0.000392           22.9     0.027 
##  5 Lab_02                     0.00228         0.0000360           2.28    0.0138
##  6 Lab_03                     0.0204          0.00044            20.4     0.0163
##  7 Lab_04                     0.0236          0.000372           23.6     0.0233
##  8 Lab_05                     0.00569         0.000136            5.69    0.0198
##  9 Lab_06                     0.0641          0.00127            64.1     0.0523
## 10 Lab_07                     0.0735          0.000616           73.5     0.0178
## # ℹ 41 more rows
## # ℹ abbreviated name: ¹​PE_mg_per_g_sample_mean
## # ℹ 89 more variables: X564_mean <dbl>, X565_mean <dbl>, X592_mean <dbl>,
## #   Xred_mean <dbl>, diff_absorbance_mean <dbl>, total_PE_mg_mean <dbl>,
## #   PE_mg_per_g_sample_sd <dbl>, PE_mg_per_mL_sd <dbl>, PE_scaled_sd <dbl>,
## #   X455_sd <dbl>, X564_sd <dbl>, X565_sd <dbl>, X592_sd <dbl>, Xred_sd <dbl>,
## #   diff_absorbance_sd <dbl>, total_PE_mg_sd <dbl>, …
## 
## $merged_cells
## [1] 44
## 
## $unmatched_cells
## [1] 18
## 
## $unmatched_saved_to
## [1] "unmatched_rows_PErep_final.csv"
## 
## $assigned_to_global
## [1] TRUE
graph_replicates_custom_error(
  data = PErep_final,
  id_col = "join_id",
  value_col = "PE_mg_per_g_sample_mean",
  se_col = "PE_mg_per_g_sample_se",
  output_prefix = "E_rep_analy"
)
## Saving 7 x 5 in image
## Plot saved to: E_rep_analy_plots/PE_mg_per_g_sample_mean_replicates.png
compare_groups(PErep_final, response_var = "PE_mg_per_g_sample_mean", group_var = "Location") #PE data by location
## 
## Attaching package: 'rlang'
## The following objects are masked from 'package:purrr':
## 
##     %@%, flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
##     flatten_raw, invoke, splice
## Parametric test result (ANOVA):
##             Df  Sum Sq  Mean Sq F value  Pr(>F)   
## Location     9 0.07231 0.008035   3.511 0.00363 **
## Residuals   34 0.07781 0.002289                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 7 observations deleted due to missingness
## Non-parametric test result (Kruskal-Wallis test):
## 
##  Kruskal-Wallis rank sum test
## 
## data:  PE_mg_per_g_sample_mean by Location
## Kruskal-Wallis chi-squared = 17.66, df = 9, p-value = 0.03933
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_boxplot()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_compare_means()`).
## Removed 1 row containing non-finite outside the scale range
## (`stat_compare_means()`).
## Warning: Removed 7 rows containing missing values or values outside the scale range
## (`geom_point()`).

compare_groups(PErep_final, response_var = "Xred_mean", group_var = "Location") #Fluoresence by location
## Parametric test result (ANOVA):
##             Df    Sum Sq   Mean Sq F value   Pr(>F)    
## Location     9 3.816e+09 423961246   4.744 0.000399 ***
## Residuals   34 3.039e+09  89367983                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 7 observations deleted due to missingness
## Non-parametric test result (Kruskal-Wallis test):
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Xred_mean by Location
## Kruskal-Wallis chi-squared = 10.857, df = 9, p-value = 0.2857
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_boxplot()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_compare_means()`).
## Removed 1 row containing non-finite outside the scale range
## (`stat_compare_means()`).
## Warning: Removed 7 rows containing missing values or values outside the scale range
## (`geom_point()`).

compare_groups(PErep_final, response_var = "PE_mg_per_g_sample_mean", group_var = "variety") #PE data by variety
## Parametric test result (t-test):
## 
##  Welch Two Sample t-test
## 
## data:  PE_mg_per_g_sample_mean by variety
## t = 2.7436, df = 41.298, p-value = 0.008949
## alternative hypothesis: true difference in means between group C.Cham and group F. Glom is not equal to 0
## 95 percent confidence interval:
##  0.01003143 0.06594116
## sample estimates:
##  mean in group C.Cham mean in group F. Glom 
##            0.06949904            0.03151275
## Non-parametric test result (Wilcoxon test):
## 
##  Wilcoxon rank sum exact test
## 
## data:  PE_mg_per_g_sample_mean by variety
## W = 271, p-value = 0.03735
## alternative hypothesis: true location shift is not equal to 0
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_boxplot()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_compare_means()`).
## Removed 1 row containing non-finite outside the scale range
## (`stat_compare_means()`).
## Warning: Removed 7 rows containing missing values or values outside the scale range
## (`geom_point()`).

compare_groups(PErep_final, response_var = "PE_mg_per_g_sample_mean", group_var = "Life_S") #PE data by life stage
## Parametric test result (ANOVA):
##             Df  Sum Sq  Mean Sq F value Pr(>F)
## Life_S       4 0.00874 0.002184   0.568  0.687
## Residuals   35 0.13453 0.003844               
## 11 observations deleted due to missingness
## Non-parametric test result (Kruskal-Wallis test):
## 
##  Kruskal-Wallis rank sum test
## 
## data:  PE_mg_per_g_sample_mean by Life_S
## Kruskal-Wallis chi-squared = 4.1876, df = 4, p-value = 0.3812
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_boxplot()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_compare_means()`).
## Removed 1 row containing non-finite outside the scale range
## (`stat_compare_means()`).
## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_point()`).